Abstract. Probabilistic model checking mainly concentrates on techniques
for reasoning about the probabilities of certain path properties or
expected values of certain random variables. For the quantitative system
analysis, however, there is also another type of interesting performance
measure, namely quantiles. A typical quantile query takes as input a
lower probability bound $p \in\ ]0, 1]$ and a reachability property. The task is
then to compute the minimal reward bound $r$ such that with probability
at least $p$ the target set will be reached before the accumulated reward
exceeds $r$. Quantiles are well-known from mathematical statistics, but to
the best of our knowledge they have not been addressed by the model
checking community so far.

In this paper, we study the complexity of quantile queries for until
properties in discrete-time finite-state Markov decision processes with
nonnegative rewards on states. We show that qualitative quantile queries
can be evaluated in polynomial time and present an exponential algorithm
for the evaluation of quantitative quantile queries. For the special case
of Markov chains, we show that quantitative quantile queries can be
evaluated in pseudo-polynomial time.

1 Introduction 

Markov models with reward (or cost) functions are widely used for the quantitative
system analysis. We focus here on the discrete-time or time-abstract case.
Discrete-time Markov decision processes, MDPs for short, can be used, for instance, as
an operational model for randomised distributed algorithms, and rewards might
serve to reason, e.g., about the size of the buffer of a communication channel or
about the number of rounds that a leader election protocol might take until a
leader has been elected.

Several authors considered variants of probabilistic computation tree logic
(PCTL) [12, 4] for specifying quantitative constraints on the behaviour of Markov
models with reward functions. Such extensions, briefly called PRCTL here,
allow one to specify constraints on the probabilities of reward-bounded reachability
conditions, on the expected accumulated rewards until a certain set of target
states is reached or expected instantaneous rewards after some fixed number of
steps [7, 6, 9, 1, 15], or on long-run averages [8]. An example for a typical PRCTL
formula with PCTL's probability operator and the reward-bounded until operator
is the formula $\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq r}\, b)$ where $p$ is a lower probability bound in $[0, 1[$ and
$r$ is an upper bound for the accumulated reward earned by path fragments
that lead via states where $a$ holds to a $b$-state. From a practical point of view,
more important than checking whether a given PRCTL formula $\varphi$ holds for
(the initial state of) a Markov model $\mathcal{M}$ are PRCTL queries of the form $\mathrm{P}_{=?}\,\psi$
where the task is to calculate the (minimum or maximum) probability for the
path formula $\psi$. Indeed, the standard PRCTL model checking algorithm checks
whether a given formula $\mathrm{P}_{\bowtie p}\,\psi$ holds in $\mathcal{M}$ by evaluating the PRCTL query
$\mathrm{P}_{=?}\,\psi$ and comparing the computed value $q$ with the given probability bound $p$
according to the comparison predicate $\bowtie$. The standard procedure for dealing
with PRCTL formulas that refer to expected (instantaneous or accumulated)
rewards relies on an analogous scheme; see e.g. [10]. An exception can be made
for qualitative PRCTL properties $\mathrm{P}_{\bowtie p}\,\psi$ where the probability bound $p$ is either
0 or 1, and the path formula $\psi$ is a plain until formula without reward bound (or
any $\omega$-regular path property without reward constraints): in this case, a graph
analysis suffices to check whether $\mathrm{P}_{\bowtie p}\,\psi$ holds for $\mathcal{M}$ [16, 5].

In a common project with the operating system group of our department, we
learned that a natural question for the systems community is to swap the given
and unknown parameters in PRCTL queries and to ask for the computation of
a quantile (see [2]). For instance, if $\mathcal{M}$ models a mutual exclusion protocol for
competing processes $P_1, \dots, P_n$ and rewards are used to represent the time spent
by process $P_i$ in its waiting location, then the quantile query
$\mathrm{P}_{>0.9}(\mathit{wait}_i \,\mathrm{U}_{\leq ?}\, \mathit{crit}_i)$
asks for the minimal time bound $r$ such that in all scenarios (i.e., under all
schedulers) with probability greater than 0.9 process $P_i$ will wait no longer than
$r$ time units before entering its critical section. For another example, suppose
$\mathcal{M}$ models the management system of a service execution platform. Then the
query $\mathrm{P}_{>0.98}(\mathit{true} \,\mathrm{U}_{\leq ?}\, \mathit{tasks\ completed})$ might ask for the minimal initial energy
budget $r$ that is required to ensure that even in the worst case there is more than
98% chance to reach a state where all tasks have been completed successfully.

To the best of our knowledge, quantile queries have not yet been addressed
directly in the model checking community. What is known from the literature
is that for finite Markov chains with nonnegative rewards the task of checking
whether a PRCTL formula $\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq r}\, b)$ or $\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq r}\, b)$ holds for some given
state is NP-hard [14] when $p$ and $r$ are represented in binary. Since such a formula
holds in state $s$ if and only if the value of the corresponding quantile query at $s$
is $\leq r$, this implies that evaluating quantile queries is also NP-hard.

The purpose of this paper is to study quantile queries for Markov decision
processes with nonnegative rewards in more detail. We consider quantile queries
for reward-bounded until formulas in combination with the standard PRCTL
quantifier $\mathrm{P}_{\bowtie p}$ (in this paper denoted by $\forall\mathrm{P}_{\bowtie p}$), where universal quantification
over all schedulers is inherent in the semantics, and its dual $\exists\mathrm{P}_{\bowtie p}$ that asks
for the existence of some scheduler enjoying a certain property. By duality, our
results carry over to reward-bounded release properties.

Contributions. First, we address qualitative quantile queries, i.e. quantile
queries where the probability bound is either 0 or 1, and we show that such
queries can be evaluated in strongly polynomial time. Our algorithm is surprisingly
simple and does not rely on value iteration or linear programming techniques (as 
it is e.g. the case for extremal expected reachability times and stochastic shortest- 
paths problems in MDPs ^). Instead, our algorithm relies on the greedy method 
and borrows ideas from Dijkstra's shortest-path algorithm. In particular, our
algorithm can be used for checking PRCTL formulas of the form
$\forall\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq r}\, b)$ or $\exists\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq r}\, b)$
with $p \in \{0, 1\}$ in polynomial time. Previously, a polynomial-time
algorithm was known only for the special case of MDPs where every loop contains
a state with nonzero reward [13].

Second, we consider quantitative quantile queries. The standard way to compute
the maximal or minimal probabilities for reward-bounded until properties,
say $a \,\mathrm{U}_{\leq r}\, b$, relies on the iterative computation of the extremal probabilities
for $a \,\mathrm{U}_{\leq i}\, b$ with increasing reward bound $i$. We use here a reformulation of this
computation scheme as a linear program whose size is polynomial in the number of states
of $\mathcal{M}$ and the given reward bound $r$. The crux to derive from this linear program
an algorithm for the evaluation of quantile queries is to provide a bound for
the sought value, which is our second contribution. This bound then permits
us to perform a sequential search for the quantile, which yields an exponentially
time-bounded algorithm for evaluating quantitative quantile queries. Finally,
in the special case of Markov chains with integer rewards, we show that this
algorithm can be improved to run in time polynomial in the size of the query,
the size of the chain, and the largest reward, i.e. in pseudo-polynomial time.

Outline. The structure of the paper is as follows. Section 2 summarises the
relevant concepts of Markov decision processes and briefly recalls the logic PRCTL.
Quantile queries are introduced in Sect. 3. Our polynomial-time algorithm for
qualitative quantile queries is presented in Sect. 4, whereas the quantitative case
is addressed in Sect. 5. The paper ends with some concluding remarks in Sect. 6.

2 Preliminaries 

In the following, we assume a countably infinite set $\mathit{AP}$ of atomic propositions.
A Markov decision process (MDP) $\mathcal{M} = (S, \mathit{Act}, \gamma, \lambda, \mathit{rew}, \delta)$ with nonnegative
rewards consists of a finite set $S$ of states, a finite set $\mathit{Act}$ of actions, a function
$\gamma\colon S \to 2^{\mathit{Act}} \setminus \{\emptyset\}$ describing the set of enabled actions in each state, a labelling
function $\lambda\colon S \to 2^{\mathit{AP}}$, a reward function $\mathit{rew}\colon S \to \mathbb{R}^{\geq 0}$, and a transition function
$\delta\colon S \times \mathit{Act} \times S \to [0, 1]$ such that $\sum_{t \in S} \delta(s, \alpha, t) = 1$ for all $s \in S$ and $\alpha \in \mathit{Act}$.
If the set $\mathit{Act}$ of actions is just a singleton, we call $\mathcal{M}$ a Markov chain.

Given an MDP $\mathcal{M}$, we say that a state $s$ of $\mathcal{M}$ is absorbing if $\delta(s, \alpha, s) = 1$ for
all $\alpha \in \gamma(s)$. Moreover, for $a \in \mathit{AP}$ we denote by $\lambda^{-1}(a)$ the set of states $s$ such
that $a \in \lambda(s)$, and for $x = s_0 s_1 \dots s_k \in S^*$ we denote by $\mathit{rew}(x)$ the accumulated
reward after $x$, i.e. $\mathit{rew}(x) = \sum_{i=0}^{k} \mathit{rew}(s_i)$. Finally, we denote by $|\delta|$ the number
of nontrivial transitions in $\mathcal{M}$, i.e. $|\delta| = |\{(s, \alpha, t) : \alpha \in \gamma(s) \text{ and } \delta(s, \alpha, t) > 0\}|$.

Schedulers are used to resolve the nondeterminism that arises from the
possibility that more than one action might be enabled in a given state. Formally,
a scheduler for $\mathcal{M}$ is a mapping $\sigma\colon S^+ \to \mathit{Act}$ such that $\sigma(xs) \in \gamma(s)$ for all
$x \in S^*$ and $s \in S$. Such a scheduler $\sigma$ is memoryless if $\sigma(xs) = \sigma(s)$ for all
$x \in S^*$ and $s \in S$. Given a scheduler $\sigma$ and an initial state $s = s_0$, there is
a unique probability measure $\Pr_s^\sigma$ on the Borel $\sigma$-algebra over $S^\omega$ such that
$\Pr_s^\sigma(s_0 s_1 \dots s_k \cdot S^\omega) = \prod_{i=0}^{k-1} \delta(s_i, \sigma(s_0 \dots s_i), s_{i+1})$; see [3].
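
The product formula for cylinder sets can be checked on small examples. The following Python sketch is only an illustration: the function name `cylinder_probability`, the dictionary encoding of the transition function and the use of exact fractions are assumptions made here and do not come from the paper.

```python
from fractions import Fraction


def cylinder_probability(path, scheduler, delta):
    """Probability of the cylinder set path . S^omega under a scheduler.

    path      -- non-empty list of states s_0, ..., s_k
    scheduler -- function mapping a history (tuple of states) to an action
    delta     -- dict mapping (state, action) to {successor: probability}
    (All three encodings are illustrative assumptions.)
    """
    prob = Fraction(1)
    for i in range(len(path) - 1):
        # The scheduler sees the whole history s_0 ... s_i.
        action = scheduler(tuple(path[: i + 1]))
        # Missing entries encode transitions of probability 0.
        prob *= delta[(path[i], action)].get(path[i + 1], Fraction(0))
    return prob
```

A memoryless scheduler corresponds to a function that only inspects the last state of the history.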

Several logics have been introduced in order to reason about the probability
measures $\Pr_s^\sigma$. In particular, the logics PCTL and PCTL* replace the path
quantifiers of CTL and CTL* by a single probabilistic quantifier $\mathrm{P}_{\bowtie p}$, where
$\bowtie\ \in \{<, \leq, \geq, >\}$ and $p \in [0, 1]$. In these logics, the formula $\varphi = \mathrm{P}_{\bowtie p}\,\psi$ holds in
state $s$ (written $s \models \varphi$) if under all schedulers $\sigma$ the probability $\Pr_s^\sigma(\psi)$ of the
path property $\psi$ compares positively with $p$ wrt. the comparison operator $\bowtie$, i.e.
if $\Pr_s^\sigma(\psi) \bowtie p$. A dual existential quantifier $\exists\mathrm{P}_{\bowtie p}$ that asks for the existence of
a scheduler can be introduced using the equivalence $\exists\mathrm{P}_{\bowtie p}\,\psi \equiv \neg\mathrm{P}_{\overline{\bowtie} p}\,\psi$, where
$\overline{\bowtie}$ denotes the dual inequality. Since many properties of MDPs can be expressed
more naturally using the $\exists\mathrm{P}$ quantifier, we consider this quantifier an equal citizen
of the logic, and we denote the universal quantifier $\mathrm{P}$ by $\forall\mathrm{P}$ in order to stress its
universal semantics.

In order to be able to reason about accumulated rewards, we amend the until
operator $\mathrm{U}$ by a reward constraint of the form $\sim r$, where $\sim$ is a comparison
operator and $r \in \mathbb{R} \cup \{\pm\infty\}$. Since we adopt the convention that a reward is
earned upon leaving a state, a path $\pi = s_0 s_1 \dots$ fulfils the formula $\psi_1 \,\mathrm{U}_{\sim r}\, \psi_2$ if
there exists a point $k \in \mathbb{N}$ such that 1. $s_k s_{k+1} \dots \models \psi_2$, 2. $s_i s_{i+1} \dots \models \psi_1$ for
all $i < k$, and 3. $\mathit{rew}(s_0 \dots s_{k-1}) \sim r$. Even though our logic is only a subset of
the logics PRCTL and PRCTL* defined in [1], we use the same names for the
extension of PCTL and PCTL* with the amended until operator. The following
proposition states that extremal probabilities for PRCTL* are attainable. This
follows, for instance, from the fact that PRCTL* can only describe $\omega$-regular
path properties.
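
To make the convention that a reward is earned upon leaving a state concrete, the following Python sketch checks the three conditions above for atomic propositions $a$ and $b$, the comparison $\leq$, and a finite path prefix. The function name and the dictionary encodings of labels and rewards are illustrative assumptions; on a prefix the check can only confirm a witness $k$ inside the prefix.

```python
def fulfils_bounded_until(path, labels, rew, a, b, r):
    """Return True if some position k of the finite prefix `path` witnesses
    a U_{<=r} b: b holds at s_k, a holds at all earlier positions, and
    rew(s_0 ... s_{k-1}) <= r.

    labels -- dict mapping a state to its set of atomic propositions
    rew    -- dict mapping a state to its nonnegative reward
    (Both encodings are illustrative assumptions.)
    """
    accumulated = 0  # rew(s_0 ... s_{k-1}); the reward of s_k is not counted
    for state in path:
        if b in labels[state] and accumulated <= r:
            return True
        if a not in labels[state]:
            return False  # a is violated before any witness was found
        accumulated += rew[state]  # reward is earned when leaving the state
    return False
```

Note that the reward of the $b$-state itself never contributes, which is exactly condition 3.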

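Proposition 1. Let $\mathcal{M}$ be an MDP and $\psi$ a PRCTL* path formula. Then there
exist schedulers $\sigma^*$ and $\tau^*$ such that $\Pr_s^{\sigma^*}(\psi) = \sup_\sigma \Pr_s^\sigma(\psi)$ and
$\Pr_s^{\tau^*}(\psi) = \inf_\sigma \Pr_s^\sigma(\psi)$ for all states $s$ of $\mathcal{M}$.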

3 Quantile queries 

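A quantile query is of the form $\varphi = \forall\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq ?}\, b)$ or $\varphi = \exists\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq ?}\, b)$, where
$a, b \in \mathit{AP}$, $p \in [0, 1]$ and $\bowtie\ \in \{<, \leq, \geq, >\}$. We call queries of the former type
universal and queries of the latter type existential. If $r \in \mathbb{R} \cup \{\pm\infty\}$, we write
$\varphi[r]$ for the PRCTL formula that is obtained from $\varphi$ by replacing $?$ with $r$.

Given an MDP $\mathcal{M}$ with rewards, evaluating $\varphi$ on $\mathcal{M}$ amounts to computing,
for each state $s$ of $\mathcal{M}$, the least or the largest $r \in \mathbb{R}$ such that $s \models \varphi[r]$. Formally,
if $\varphi = \forall\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq ?}\, b)$ or $\varphi = \exists\mathrm{P}_{\bowtie p}(a \,\mathrm{U}_{\leq ?}\, b)$ then the value of a state $s$ of $\mathcal{M}$
with respect to $\varphi$ is $\mathrm{val}^{\mathcal{M}}_\varphi(s) := \mathrm{opt}\{r \in \mathbb{R} : s \models \varphi[r]\}$, where $\mathrm{opt} = \inf$ if
$\bowtie\ \in \{\geq, >\}$ and $\mathrm{opt} = \sup$ otherwise.^1 Depending on whether $\mathrm{val}^{\mathcal{M}}_\varphi(s)$ is defined
as an infimum or a supremum, we call $\varphi$ a minimising or a maximising query,
respectively. In the following, we will omit the superscript $\mathcal{M}$ when the underlying
MDP is clear from the context.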

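Given a query $\varphi$, we define the dual query to be the unique quantile query $\overline{\varphi}$
such that $\overline{\varphi}[r] \equiv \neg\varphi[r]$ for all $r \in \mathbb{R} \cup \{\pm\infty\}$. Hence, to form the dual of a
query, one only needs to replace the quantifier $\forall\mathrm{P}_{\bowtie p}$ by $\exists\mathrm{P}_{\overline{\bowtie} p}$ and vice versa.
For instance, the dual of $\forall\mathrm{P}_{<p}(a \,\mathrm{U}_{\leq ?}\, b)$ is $\exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$. Note that the dual of a
universal or minimising query is an existential or maximising query, respectively,
and vice versa.

Proposition 2. Let $\mathcal{M}$ be an MDP and $\varphi$ a quantile query. Then $\mathrm{val}_\varphi(s) =
\mathrm{val}_{\overline{\varphi}}(s)$ for all states $s$ of $\mathcal{M}$.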

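Proof. Without loss of generality, assume that $\varphi$ is a minimising query. Let $s \in S$,
$v = \mathrm{val}_\varphi(s)$ and $v' = \mathrm{val}_{\overline{\varphi}}(s)$. On the one hand, for all $r < v$ we have $s \not\models \varphi[r]$,
i.e. $s \models \overline{\varphi}[r]$, and therefore $v' \geq v$. On the other hand, since $\varphi[r]$ implies $\varphi[r']$ for
$r' \geq r$, for all $r > v$ we have $s \models \varphi[r]$, i.e. $s \not\models \overline{\varphi}[r]$, and therefore also $v' \leq v$. □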

Assume that we have computed the value $\mathrm{val}_\varphi(s)$ of a state $s$ with respect
to a quantile query $\varphi$. Then, for any $r \in \mathbb{R}$, to decide whether $s \models \varphi[r]$, we just
need to compare $r$ to $\mathrm{val}_\varphi(s)$.

Proposition 3. Let $\mathcal{M}$ be an MDP, $s$ a state of $\mathcal{M}$, $\varphi$ a minimising or
maximising quantile query, and $r \in \mathbb{R}$. Then $s \models \varphi[r]$ if and only if $\mathrm{val}_\varphi(s) \leq r$ or
$\mathrm{val}_\varphi(s) > r$, respectively.

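Proof. First assume that $\varphi = Q(a \,\mathrm{U}_{\leq ?}\, b)$ is a minimising query. Clearly, if
$s \models \varphi[r]$, then $\mathrm{val}_\varphi(s) \leq r$. On the other hand, assume that $\mathrm{val}_\varphi(s) \leq r$ and
denote by $R$ the set of numbers $x \in \mathbb{R}$ of the form $x = \sum_{i=0}^{k} \mathit{rew}(s_i)$ for a finite
sequence $s_0 s_1 \dots s_k$ of states. Since the set $\{x \in R : x \leq n\}$ is finite for all $n \in \mathbb{N}$,
we can fix some $\varepsilon > 0$ such that $r + \delta \notin R$ for all $0 < \delta \leq \varepsilon$. Hence, the set of
paths that fulfil $a \,\mathrm{U}_{\leq r}\, b$ agrees with the set of paths that fulfil $a \,\mathrm{U}_{\leq r+\varepsilon}\, b$. Since
$\mathrm{val}_\varphi(s) < r + \varepsilon$ and $\varphi$ is a minimising query, we know that $s \models \varphi[r + \varepsilon]$. Since
replacing $r + \varepsilon$ by $r$ does not affect the path property, this implies that $s \models \varphi[r]$.
Finally, if $\varphi$ is a maximising query, then $\overline{\varphi}$ is a minimising query, and $s \models \overline{\varphi}[r]$ if
and only if $\mathrm{val}_\varphi(s) = \mathrm{val}_{\overline{\varphi}}(s) \leq r$, i.e. $s \models \varphi[r]$ if and only if $\mathrm{val}_\varphi(s) > r$. □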

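Proposition 3 does not hold when we allow $r$ to take an infinite value. In fact,
if $\varphi$ is a minimising query and $s \not\models \varphi[\infty]$, then $\mathrm{val}_\varphi(s) = \infty$. Analogously, if $\varphi$ is
a maximising query and $s \not\models \varphi[-\infty]$, then $\mathrm{val}_\varphi(s) = -\infty$.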

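To conclude this section, let us remark that queries using the reward-bounded
release operator $\mathrm{R}$ can easily be accommodated in our framework. For instance,
the query $\forall\mathrm{P}_{\geq p}(a \,\mathrm{R}_{\leq ?}\, b)$ is equivalent to the query $\forall\mathrm{P}_{\leq 1-p}(\neg a \,\mathrm{U}_{\leq ?}\, \neg b)$.

^1 As usual, we assume that $\inf \emptyset = \infty$ and $\sup \emptyset = -\infty$.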
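Algorithm 1. Solving qualitative queries of the form $Q(a \,\mathrm{U}_{\leq ?}\, b)$

Input: MDP $\mathcal{M} = (S, \mathit{Act}, \gamma, \lambda, \mathit{rew}, \delta)$, $\varphi = Q(a \,\mathrm{U}_{\leq ?}\, b)$

    for each $s \in S$ do
        if $s \models b$ then $v(s) \leftarrow 0$ else $v(s) \leftarrow \infty$
    $X \leftarrow \{s \in S : v(s) = 0\}$; $R \leftarrow \{0\}$
    $Z \leftarrow \{s \in S : s \models a \wedge \neg b \text{ and } \mathit{rew}(s) = 0\}$
    while $R \neq \emptyset$ do
        $r \leftarrow \min R$; $Y \leftarrow \{s \in X : v(s) \leq r\} \setminus Z$
        for each $s \in S \setminus X$ with $s \models a \wedge Q\mathrm{X}(Z \,\mathrm{U}\, Y)$ do
            $v(s) \leftarrow r + \mathit{rew}(s)$
            $X \leftarrow X \cup \{s\}$; $R \leftarrow R \cup \{v(s)\}$
        $R \leftarrow R \setminus \{r\}$
    return $v$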



4 Evaluating qualitative queries 

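In this section, we give a strongly polynomial-time algorithm for evaluating
qualitative queries, i.e. queries where the probability bound $p$ is either 0 or 1.
Throughout this section, let $\mathcal{M} = (S, \mathit{Act}, \gamma, \lambda, \mathit{rew}, \delta)$ be an MDP with
nonnegative rewards. By Proposition 2 we can restrict to queries using one of the
quantifiers $\forall\mathrm{P}_{>0}$, $\exists\mathrm{P}_{>0}$, $\forall\mathrm{P}_{=1}$ and $\exists\mathrm{P}_{=1}$. The following lemma allows us to give a
unified treatment of all cases ($\mathrm{X}$ denotes the next-step operator).

Lemma 4. The equivalence $Q\mathrm{X}(a \,\mathrm{U}\, (\neg a \wedge \psi)) \equiv Q\mathrm{X}(a \,\mathrm{U}\, (\neg a \wedge Q\psi))$ holds in
PRCTL* for all $Q \in \{\forall\mathrm{P}_{>0}, \exists\mathrm{P}_{>0}, \forall\mathrm{P}_{=1}, \exists\mathrm{P}_{=1}\}$, $a \in \mathit{AP}$, and all path formulas $\psi$.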

Algorithm 1 is our algorithm for computing the values of a quantile query
where we look for an upper bound on the accumulated reward. The algorithm
maintains a set $X$ of states, a set $R$ of real numbers, and a table $v$ mapping states
to nonnegative real numbers or infinity. The algorithm works by discovering
states with finite value repeatedly until only the states with infinite value remain.
Whenever a new state is discovered, it is put into $X$ and its value is put into $R$. In
the initialisation phase, the algorithm discovers all states labelled with $b$, which
have value 0. In every iteration of the main loop, new states are discovered by
picking the least value $r$ that has not been fully processed (i.e. the least element
of $R$) and checking which undiscovered $a$-labelled states fulfil the PCTL* formula
$Q\mathrm{X}(Z \,\mathrm{U}\, Y)$, where $Y$ is the set of already discovered states whose value is at
most $r$ and $Z$ is the set of states labelled with $a$ but not with $b$ and having
reward 0. Any such newly discovered state $s$ must have value $r + \mathit{rew}(s)$, and
$r$ can be deleted from $R$ at the end of the current iteration. The termination of
the algorithm follows from the fact that in every iteration of the main loop either
the set $X$ increases or it remains constant and one element is removed from $R$.
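
The discovery loop can be sketched in Python for the special case $Q = \exists\mathrm{P}_{>0}$, where checking $Q\mathrm{X}(Z \,\mathrm{U}\, Y)$ reduces to a plain graph search: some successor must reach $Y$ along states in $Z$. This is a minimal sketch under stated assumptions, not the paper's implementation: the successor map `succ` (all states reachable with positive probability under some enabled action), the function name and the set-based encoding of labels are introduced here for illustration only.

```python
import math


def qualitative_values_exists_positive(states, succ, a_states, b_states, rew):
    """Sketch of the discovery loop of Algorithm 1 for Q = EP_{>0}.

    states   -- list of states
    succ     -- dict: state -> set of successors with positive probability
                under some enabled action (illustrative encoding)
    a_states -- set of states labelled with a
    b_states -- set of states labelled with b
    rew      -- dict: state -> nonnegative reward
    """
    v = {s: (0 if s in b_states else math.inf) for s in states}
    X = {s for s in states if v[s] == 0}
    R = {0}
    Z = {s for s in states
         if s in a_states and s not in b_states and rew[s] == 0}
    while R:
        r = min(R)
        Y = {s for s in X if v[s] <= r} - Z
        # W = states from which Y is reachable via states in Z,
        # i.e. the states fulfilling E(Z U Y).
        W = set(Y)
        changed = True
        while changed:
            changed = False
            for z in Z - W:
                if succ[z] & W:
                    W.add(z)
                    changed = True
        # Undiscovered a-states with some successor in W fulfil EP_{>0} X(Z U Y).
        discovered = [s for s in states
                      if s not in X and s in a_states and succ[s] & W]
        for s in discovered:
            v[s] = r + rew[s]
            X.add(s)
            R.add(v[s])
        R.discard(r)
    return v
```

For the other three quantifiers the test `succ[s] & W` would have to be replaced by the corresponding qualitative PCTL* check, which this sketch does not attempt.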

Lemma 5. Let $\mathcal{M}$ be an MDP, $\varphi = Q(a \,\mathrm{U}_{\leq ?}\, b)$ a qualitative query, and let $v$ be
the result of Algorithm 1 on $\mathcal{M}$ and $\varphi$. Then $v(s) = \mathrm{val}_\varphi(s)$ for all states $s$.

Proof. We first prove that $s \models \varphi[v(s)]$ for all states $s$ with $v(s) < \infty$. Hence, $v$ is
an upper bound on $\mathrm{val}_\varphi$. We prove this by induction on the number of iterations
the while loop has performed before assigning a finite value to $v(s)$. Note that
this is the same iteration when $s$ is put into $X$ and that $v(s)$ never changes
afterwards. If $s$ is put into $X$ before the first iteration, then $s \models b$ and therefore
also $s \models \varphi[0] = \varphi[v(s)]$. Now assume that the while loop has already completed
$i$ iterations and is about to add $s$ to $X$ in the current iteration; let $X$, $r$ and $Y$
be as at the beginning of this iteration (after $r$ and $Y$ have been assigned, but
before any new state is added to $X$). By the induction hypothesis, $t \models \varphi[r]$ for all
$t \in Y$. Since $s$ is added to $X$, we have that $s \models a \wedge Q\mathrm{X}(Z \,\mathrm{U}\, Y)$. Using Lemma 4
and some basic PRCTL* laws, we can conclude that $s \models \varphi[v(s)]$ as follows:

\[
\begin{aligned}
& s \models a \wedge Q\mathrm{X}(Z \,\mathrm{U}\, Y) \\
\Longrightarrow\ & s \models a \wedge Q\mathrm{X}(Z \,\mathrm{U}\, (\neg Z \wedge Q(a \,\mathrm{U}_{\leq r}\, b))) \\
\Longrightarrow\ & s \models a \wedge Q\mathrm{X}(Z \,\mathrm{U}\, (\neg Z \wedge (a \,\mathrm{U}_{\leq r}\, b))) \\
\Longrightarrow\ & s \models a \wedge Q\mathrm{X}(a \,\mathrm{U}_{\leq r}\, b) \\
\Longrightarrow\ & s \models Q(a \,\mathrm{U}_{\leq r + \mathit{rew}(s)}\, b) \\
\Longrightarrow\ & s \models \varphi[v(s)]
\end{aligned}
\]

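To complete the proof, we need to show that $v$ is also a lower bound on $\mathrm{val}_\varphi$.
We define a strict partial order $\prec$ on states by setting $s \prec t$ if one of the following
conditions holds:

1. $s \models b$ and $t \not\models b$,

2. $\mathrm{val}_\varphi(s) < \mathrm{val}_\varphi(t)$, or

3. $\mathrm{val}_\varphi(s) = \mathrm{val}_\varphi(t)$ and $\mathit{rew}(s) > \mathit{rew}(t)$.

Towards a contradiction, assume that the set $C$ of states $s$ with $\mathrm{val}_\varphi(s) < v(s)$
is non-empty, and pick a state $s \in C$ that is minimal with respect to $\prec$ (in
particular, $\mathrm{val}_\varphi(s) < \infty$). Since $s \models \varphi[\infty]$ and the algorithm correctly sets $v(s)$
to 0 if $s \models b$, we know that $s \models a \wedge \neg b$ and $\mathrm{val}_\varphi(s) \geq \mathit{rew}(s)$. Moreover, by
Proposition 3, $s \models \varphi[\mathrm{val}_\varphi(s)]$. Let $T$ be the set of all states $t \in S \setminus Z$ such that
$\mathrm{val}_\varphi(t) + \mathit{rew}(s) \leq \mathrm{val}_\varphi(s)$, i.e. $t \models \varphi[\mathrm{val}_\varphi(s) - \mathit{rew}(s)]$. Note that $T \neq \emptyset$ (because
every state labelled with $b$ is in $T$) and that $t \prec s$ for all $t \in T$. Since $s$ is a minimal
counter-example, we know that $v(t) \leq \mathrm{val}_\varphi(t) < \infty$ for all $t \in T$. Consequently,
after some number of iterations of the while loop all elements of $T$ have been
added to $X$ and the numbers $v(t)$ have been added to $R$. Since $R$ is empty upon
termination, in a following iteration we have that $r = \max\{v(t) : t \in T\}$ and
that $T \subseteq Y$. Let $x := \mathrm{val}_\varphi(s) - \mathit{rew}(s)$. Using Lemma 4 and some basic PRCTL*
laws, we can conclude that $s \models Q\mathrm{X}(Z \,\mathrm{U}\, Y)$ as follows:

\[
\begin{aligned}
& s \models \neg b \wedge \varphi[\mathrm{val}_\varphi(s)] \\
\Longrightarrow\ & s \models Q(\neg b \wedge (a \,\mathrm{U}_{\leq x + \mathit{rew}(s)}\, b)) \\
\Longrightarrow\ & s \models Q\mathrm{X}(a \,\mathrm{U}_{\leq x}\, b) \\
\Longrightarrow\ & s \models Q\mathrm{X}(Z \,\mathrm{U}\, (\neg Z \wedge (a \,\mathrm{U}_{\leq x}\, b))) \\
\Longrightarrow\ & s \models Q\mathrm{X}(Z \,\mathrm{U}\, (\neg Z \wedge Q(a \,\mathrm{U}_{\leq x}\, b))) \\
\Longrightarrow\ & s \models Q\mathrm{X}(Z \,\mathrm{U}\, T) \\
\Longrightarrow\ & s \models Q\mathrm{X}(Z \,\mathrm{U}\, Y)
\end{aligned}
\]

Since also $s \models a$, this means that $s$ is added to $X$ no later than in the current
iteration. Hence, $v(s) \leq r + \mathit{rew}(s) \leq \mathrm{val}_\varphi(s)$, which contradicts our assumption
that $s \in C$. □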

Theorem 6. Qualitative queries of the form $Q(a \,\mathrm{U}_{\leq ?}\, b)$ can be evaluated in
strongly polynomial time.

Proof. By Lemma 5, Algorithm 1 can be used to compute the values of $Q(a \,\mathrm{U}_{\leq ?}\, b)$.
During the execution of the algorithm, the running time of one iteration of the
while loop is dominated by computing the set of states that fulfil the PCTL*
formula $Q\mathrm{X}(Z \,\mathrm{U}\, Y)$, which can be done in time $O(|\delta|)$ for $Q \in \{\forall\mathrm{P}_{>0}, \exists\mathrm{P}_{>0}, \forall\mathrm{P}_{=1}\}$
and in time $O(|S| \cdot |\delta|)$ for $Q = \exists\mathrm{P}_{=1}$ (see [1, Chapter 10]). In each iteration of
the while loop, one element of $R$ is removed, and the number of elements that
are put into $R$ in total is bounded by the number of states in the given MDP.
Hence, the number of iterations is also bounded by the number of states, and the
algorithm runs in time $O(|S| \cdot |\delta|)$ or $O(|S|^2 \cdot |\delta|)$, depending on $Q$. Finally, since
the only arithmetic operation used by the algorithm is addition, the algorithm is
strongly polynomial. □

Of course, queries of the form $\exists\mathrm{P}_{>0}(a \,\mathrm{U}_{\leq ?}\, b)$ can actually be evaluated in time
$O(|S|^2 + |\delta|)$ using Dijkstra's algorithm since the value of a state $s$ with respect to
such a query is just the weight of a shortest path from $s$ via $a$-labelled states to a
$b$-labelled state.
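
The shortest-path view can be sketched as a backward Dijkstra search in which leaving a state $u$ costs $\mathit{rew}(u)$. The code below is an illustration only: it uses a binary heap from Python's `heapq` rather than the array-based variant behind the stated time bound, and the successor map `succ`, the function name and the requirement that states are mutually comparable (for heap tie-breaking) are assumptions made here.

```python
import heapq
import math


def shortest_reward_to_b(states, succ, a_states, b_states, rew):
    """Least accumulated reward of a path via a-labelled states to a
    b-labelled state, for every state (math.inf if no such path exists).

    succ -- dict: state -> set of successors with positive probability
            under some enabled action (illustrative encoding)
    """
    # Build predecessor lists for the backward search.
    pred = {s: set() for s in states}
    for u in states:
        for t in succ[u]:
            pred[t].add(u)
    dist = {s: math.inf for s in states}
    heap = []
    for s in b_states:
        dist[s] = 0
        heapq.heappush(heap, (0, s))
    while heap:
        d, t = heapq.heappop(heap)
        if d > dist[t]:
            continue  # stale heap entry
        for u in pred[t]:
            # Only a-labelled, non-b states may precede the target on a path.
            if u in b_states or u not in a_states:
                continue
            candidate = rew[u] + d  # reward is earned when leaving u
            if candidate < dist[u]:
                dist[u] = candidate
                heapq.heappush(heap, (candidate, u))
    return dist
```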

Algorithm 1 also gives us a useful upper bound on the value of a state with
respect to a qualitative query.

Proposition 7. Let $\mathcal{M}$ be an MDP, $\varphi = Q(a \,\mathrm{U}_{\leq ?}\, b)$ a qualitative quantile query,
$n = |\lambda^{-1}(a)|$, and $c = \max\{\mathit{rew}(s) : s \in \lambda^{-1}(a)\}$. Then $\mathrm{val}_\varphi(s) \leq nc$ for all
states $s$ with $\mathrm{val}_\varphi(s) < \infty$.

Proof. By induction on the number of iterations Algorithm 1 performs before
assigning a finite number to $v(s)$. □

Finally, let us remark that our algorithm can be extended to handle queries of
the form $Q(a \,\mathrm{U}_{\geq ?}\, b)$, where a lower bound on the accumulated reward is sought.
To this end, the initialisation step has to be extended to identify states with
value $-\infty$ and the rule for discovering new states has to be modified slightly.
We invite the reader to make the necessary modifications and to verify the
correctness of the resulting algorithm. This proves that the fragment of PRCTL
with probability thresholds 0 and 1 and without reward constraints of the form
$= r$ can be model-checked in polynomial time. Previously, a polynomial-time
algorithm was only known for the special case where the models are restricted to
MDPs in which every loop contains a state with nonzero reward [13].
5 Evaluating quantitative queries

In the following, we assume that all state rewards are natural numbers. This
does not limit the applicability of our results since any MDP $\mathcal{M}$ with
nonnegative rational numbers as state rewards can be converted efficiently to an
MDP $\mathcal{M}'$ with natural rewards by multiplying all state rewards with the least
common multiple $K$ of all denominators occurring in state rewards. It follows
that $\mathrm{val}^{\mathcal{M}'}_\varphi(s) = K \cdot \mathrm{val}^{\mathcal{M}}_\varphi(s)$ for any quantile query $\varphi$ and any state $s$ of $\mathcal{M}$, so
in order to evaluate a quantile query on $\mathcal{M}$ we can evaluate it on $\mathcal{M}'$ and
divide by $K$. Throughout this section, we also assume that any transition
probability and any probability threshold $p$ occurring in a quantile query is
rational. Finally, we define the size of an MDP $\mathcal{M} = (S, \mathit{Act}, \gamma, \lambda, \mathit{rew}, \delta)$ to
be $|\mathcal{M}| := \sum_{s \in S} \|\mathit{rew}(s)\| + \sum_{(s,\alpha,t) \in \delta,\ \alpha \in \gamma(s)} \|\delta(s, \alpha, t)\|$, where $\|x\|$ denotes the
length of the binary representation of $x$.
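
The conversion to natural rewards can be sketched in a few lines of Python. The function name `scale_rewards` and the dictionary encoding of the reward function are illustrative assumptions; rewards are assumed to be given as exact `Fraction` or `int` values.

```python
from fractions import Fraction
from math import gcd


def scale_rewards(rew):
    """Return (K, scaled) where K is the least common multiple of all
    denominators in `rew` and scaled[s] = K * rew[s] is a natural number.

    rew -- dict: state -> nonnegative rational reward (Fraction or int)
    """
    K = 1
    for value in rew.values():
        d = Fraction(value).denominator
        K = K * d // gcd(K, d)  # lcm(K, d)
    scaled = {s: int(Fraction(value) * K) for s, value in rew.items()}
    return K, scaled
```

A value computed on the scaled MDP is then divided by `K` to obtain the value on the original MDP.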

5.1 Existential queries 

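In order to solve queries of the form $\exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$ or $\exists\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq ?}\, b)$, we first
show how to compute the maximal probabilities for fulfilling the path formula
$a \,\mathrm{U}_{\leq r}\, b$ when we are given the reward bound $r$. Given an MDP $\mathcal{M}$, $a, b \in \mathit{AP}$ and
$r \in \mathbb{N}$, consider the following linear program over the variables $x_{s,i}$ for $s \in S$ and
$i \in \{0, 1, \dots, r\}$:

Minimise $\sum x_{s,i}$ subject to

$x_{s,i} \geq 0$ for all $s \in S$ and $i \leq r$,

$x_{s,i} = 1$ for all $s \in \lambda^{-1}(b)$ and $i \leq r$,

$x_{s,i} \geq \sum_{t \in S} \delta(s, \alpha, t) \cdot x_{t,\, i - \mathit{rew}(s)}$ for all $s \in \lambda^{-1}(a)$, $\alpha \in \mathit{Act}$ and $\mathit{rew}(s) \leq i \leq r$.

This linear program is of size $r \cdot |\mathcal{M}|$, and it can be shown that setting $x_{s,i}$
to $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ yields the optimal solution. Hence, we can compute the
numbers $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ in time $\mathrm{poly}(r \cdot |\mathcal{M}|)$.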
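
When every state labelled with $a$ but not with $b$ has reward at least 1, the numbers $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ can also be obtained by a direct recursion over increasing $i$, because each value then only depends on values for strictly smaller bounds. The Python sketch below covers this restricted case only and is not the linear program itself; zero-reward states would require a fixed-point computation that is omitted here. The function name and the dictionary encodings of $\gamma$ and $\delta$ are illustrative assumptions.

```python
from fractions import Fraction


def max_bounded_reach_probs(states, enabled, delta, a_states, b_states, rew, r):
    """x[s][i] = max over schedulers of Pr_s(a U_{<=i} b) for 0 <= i <= r.

    Assumes integer rewards and rew[s] >= 1 for every state labelled with a
    but not with b (a restriction made by this sketch only).

    enabled -- dict: state -> iterable of enabled actions
    delta   -- dict: (state, action) -> {successor: probability}
    """
    x = {s: [Fraction(0)] * (r + 1) for s in states}
    for s in b_states:
        x[s] = [Fraction(1)] * (r + 1)
    inner = [s for s in states if s in a_states and s not in b_states]
    for s in inner:
        if rew[s] < 1:
            raise ValueError("sketch requires positive rewards on a-states")
    for i in range(r + 1):
        for s in inner:
            if rew[s] <= i:
                # Leaving s costs rew[s], so successors get the budget i - rew[s].
                x[s][i] = max(
                    sum(p * x[t][i - rew[s]]
                        for t, p in delta[(s, action)].items())
                    for action in enabled[s]
                )
    return x
```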

Our algorithm for computing the value of a state $s$ wrt. a query of the form
$\exists\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq ?}\, b)$ just computes the numbers $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ for increasing $i$
and stops as soon as this probability exceeds $p$. However, in order to make this
algorithm work and to show that it does not take too much time, we need a
bound on the value of $s$ provided this value is not infinite. Such a bound can be
derived from the following lemma, which resembles a result by Hansen et al., who
gave a bound on the convergence rate of value iteration in concurrent reachability
games [11]. Our proof is technically more involved though, since we have to deal
with paths that from some point onwards do not earn any more rewards.

Lemma 8. Let $\mathcal{M}$ be an MDP where the denominator of each transition
probability is at most $m$, and let $n = |\lambda^{-1}(a)|$, $c = \max\{\mathit{rew}(s) : s \in \lambda^{-1}(a)\}$ and
$r = k n c m^n$ for some $k \in \mathbb{N}^+$. Then $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b) < \max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq r}\, b) + e^{-k}$
for all $s \in S$.
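Proof. Without loss of generality, assume that all $b$-labelled states are absorbing.
Let us call a state $s$ of $\mathcal{M}$ dead if $s \models \forall\mathrm{P}_{=0}(a \,\mathrm{U}\, b)$, and denote by $D$ the set of
dead states. Note that $s \in D$ for all states $s$ with $s \models \neg a \wedge \neg b$. Finally, let $\tau$ be
a memoryless scheduler such that $\Pr_s^\tau(a \,\mathrm{U}\, b) = \max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b)$ for all states $s$,
and denote by $Z$ the set of all states $s$ with $s \models a \wedge \neg b$ and $\mathit{rew}(s) = 0$. By
the definition of $D$ and $Z$, we have that $\Pr_s^\tau(a \,\mathrm{U}_{\leq r}\, (D \vee \mathrm{G}Z) \wedge a \,\mathrm{U}\, b) = 0$ for all
$s \in S$. Moreover, if $s$ is not dead, then there must be a simple path from $s$ to a
$b$-labelled state via $a$-labelled states in the Markov chain induced by $\tau$. Since any
$a$-labelled state has reward at most $c$, this implies that $\Pr_s^\tau(a \,\mathrm{U}_{\leq nc}\, b) \geq m^{-n}$ for
all non-dead states $s$. Now let $\psi$ be the path formula $b \vee D \vee \mathrm{G}Z$. We claim that
$\Pr_s^\tau(\neg(a \,\mathrm{U}_{\leq r}\, \psi)) < e^{-k}$ for all states $s$. To prove this, let $s \in S$. We first show
that $\Pr_s^\tau(a \,\mathrm{U}_{\leq i+nc}\, \psi \mid \neg(a \,\mathrm{U}_{\leq i}\, \psi)) \geq m^{-n}$ for all $i \in \mathbb{N}$ with $\Pr_s^\tau(a \,\mathrm{U}_{\leq i}\, \psi) < 1$. Let
$X$ be the set of sequences $xt \in S^* \cdot S$ such that $xt \in \{s \in S \setminus D : s \models a \wedge \neg b\}^*$,
$\mathit{rew}(x) \leq i$ and $\mathit{rew}(xt) > i$. It is easy to see that the set $\{xt \cdot S^\omega : xt \in X\}$ is a
partition of the set of infinite sequences over $S$ that violate $a \,\mathrm{U}_{\leq i}\, \psi$. Using the
fact that $\tau$ is memoryless, we can conclude that

\[
\begin{aligned}
& \Pr\nolimits_s^\tau(a \,\mathrm{U}_{\leq i+nc}\, \psi \mid \neg(a \,\mathrm{U}_{\leq i}\, \psi)) \\
\geq\ & \Pr\nolimits_s^\tau(a \,\mathrm{U}_{\leq i+nc}\, b \mid \neg(a \,\mathrm{U}_{\leq i}\, \psi)) \\
=\ & \Pr\nolimits_s^\tau(a \,\mathrm{U}_{\leq i+nc}\, b \cap X \cdot S^\omega) \,/\, \Pr\nolimits_s^\tau(X \cdot S^\omega) \\
=\ & \sum_{xt \in X} \Pr\nolimits_s^\tau(a \,\mathrm{U}_{\leq i+nc}\, b \cap xt \cdot S^\omega) \,/\, \Pr\nolimits_s^\tau(X \cdot S^\omega) \\
\geq\ & \sum_{xt \in X} \Pr\nolimits_t^\tau(a \,\mathrm{U}_{\leq nc}\, b) \cdot \Pr\nolimits_s^\tau(xt \cdot S^\omega) \,/\, \Pr\nolimits_s^\tau(X \cdot S^\omega) \\
\geq\ & \sum_{xt \in X} m^{-n} \cdot \Pr\nolimits_s^\tau(xt \cdot S^\omega) \,/\, \Pr\nolimits_s^\tau(X \cdot S^\omega) \\
=\ & m^{-n}.
\end{aligned}
\]

Now, applying this inequality successively, we get that
$\Pr_s^\tau(\neg(a \,\mathrm{U}_{\leq r}\, \psi)) \leq (1 - m^{-n})^{r/(nc)} = (1 - m^{-n})^{k m^n} < e^{-k}$. Finally,

\[
\begin{aligned}
\Pr\nolimits_s^\tau(a \,\mathrm{U}\, b)
&= \Pr\nolimits_s^\tau(a \,\mathrm{U}\, b \wedge \neg(a \,\mathrm{U}_{\leq r}\, (D \vee \mathrm{G}Z))) \\
&\leq \Pr\nolimits_s^\tau(\neg(a \,\mathrm{U}_{\leq r}\, (D \vee \mathrm{G}Z))) \\
&\leq \Pr\nolimits_s^\tau(\neg(a \,\mathrm{U}_{\leq r}\, \psi) \vee (a \,\mathrm{U}_{\leq r}\, b)) \\
&\leq \Pr\nolimits_s^\tau(\neg(a \,\mathrm{U}_{\leq r}\, \psi)) + \Pr\nolimits_s^\tau(a \,\mathrm{U}_{\leq r}\, b) \\
&< e^{-k} + \max_\sigma \Pr\nolimits_s^\sigma(a \,\mathrm{U}_{\leq r}\, b)
\end{aligned}
\]

for all $s \in S$. Since $\Pr_s^\tau(a \,\mathrm{U}\, b) = \max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b)$, this inequality proves the
lemma. □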

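Given an MDP $\mathcal{M}$ and $a, b \in \mathit{AP}$, we denote by $\widetilde{\mathcal{M}}$ the MDP that arises
from $\mathcal{M}$ by performing the following transformation:

1. In each state $s$, remove all actions $\alpha$ with $\sum_{t \in S} \delta(s, \alpha, t) \cdot \max_\sigma \Pr_t^\sigma(a \,\mathrm{U}\, b) <
\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b)$ from the set $\gamma(s)$ of enabled actions.

2. Label all states $s$ such that $s \models \forall\mathrm{P}_{=0}(a \,\mathrm{U}\, b)$ with $b$.

The following lemma (proved in the appendix) allows us to reduce the query
$\exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$ to the qualitative query $\exists\mathrm{P}_{=1}(a \,\mathrm{U}_{\leq ?}\, b)$ in the special case that
$p$ equals the optimal probability of fulfilling $a \,\mathrm{U}\, b$.

Lemma 9. Let $\mathcal{M}$ be an MDP, $\varphi = \exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$ and $\psi = \exists\mathrm{P}_{=1}(a \,\mathrm{U}_{\leq ?}\, b)$.
Then $\mathrm{val}^{\mathcal{M}}_\varphi(s) = \mathrm{val}^{\widetilde{\mathcal{M}}}_\psi(s)$ for all states $s$ of $\mathcal{M}$ with $p = \max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b)$.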

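With the help of Lemmas 8 and 9 we can devise an upper bound for the
value of any query whose value is finite.

Lemma 10. Let $\mathcal{M}$ be an MDP where the denominator of each transition
probability is at most $m$, $\varphi = \exists\mathrm{P}_{\triangleright p}(a \,\mathrm{U}_{\leq ?}\, b)$ for $\triangleright \in \{\geq, >\}$, $n = |\lambda^{-1}(a)|$,
$c = \max\{\mathit{rew}(s) : s \in \lambda^{-1}(a)\}$, $s \in S$, and $q = \max_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b)$. Then at least
one of the following statements holds:

1. $p \geq q$ and $\mathrm{val}^{\mathcal{M}}_\varphi(s) = \infty$.

2. $p = q$, $\triangleright\ =\ \geq$ and $\mathrm{val}^{\mathcal{M}}_\varphi(s) \leq nc$.

3. $p < q$ and $\mathrm{val}^{\mathcal{M}}_\varphi(s) \leq k n c m^n$, where $k = \max\{-\lfloor \ln(q - p) \rfloor, 1\}$.

Proof. Clearly, if either $\triangleright\ =\ \geq$ and $p > q$ or $\triangleright\ =\ >$ and $p \geq q$, then $\mathrm{val}^{\mathcal{M}}_\varphi(s) = \infty$,
and 1. holds. Now assume that $p = q$ and $\triangleright\ =\ \geq$, and let $\psi = \exists\mathrm{P}_{=1}(a \,\mathrm{U}_{\leq ?}\, b)$.
By Lemma 9 we have that $\mathrm{val}^{\mathcal{M}}_\varphi(s) = \mathrm{val}^{\widetilde{\mathcal{M}}}_\psi(s)$. Hence, if $\mathrm{val}^{\widetilde{\mathcal{M}}}_\psi(s) = \infty$, then 1. holds.
On the other hand, if $\mathrm{val}^{\widetilde{\mathcal{M}}}_\psi(s) < \infty$, then Proposition 7 gives us that
$\mathrm{val}^{\widetilde{\mathcal{M}}}_\psi(s) \leq nc$, and 2. holds. Finally, if $p < q$, then let $r := k n c m^n$. By Lemma 8
we have that
$\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq r}\, b) > q - e^{-k} \geq q - e^{\lfloor \ln(q-p) \rfloor} \geq q - (q - p) = p$, i.e.
$s \models \exists\mathrm{P}_{\triangleright p}(a \,\mathrm{U}_{\leq r}\, b)$. Hence, $\mathrm{val}^{\mathcal{M}}_\varphi(s) \leq r$, and 3. holds. □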
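
For concrete parameters, the bound of case 3 is easy to evaluate. The Python sketch below computes $k n c m^n$ with $k = \max\{-\lfloor \ln(q - p) \rfloor, 1\}$; the function name is an illustrative assumption, and the logarithm is evaluated in floating-point arithmetic, so the result should be treated as indicative when $q - p$ is very close to a power of $e$.

```python
import math


def search_bound(p, q, n, c, m):
    """Upper bound k*n*c*m**n on a finite quantile value when p < q.

    p -- probability threshold of the query
    q -- optimal probability of fulfilling a U b from the state
    n -- number of a-labelled states
    c -- largest reward of an a-labelled state
    m -- largest denominator of a transition probability
    """
    if not p < q:
        raise ValueError("the bound of case 3 only applies when p < q")
    k = max(-math.floor(math.log(q - p)), 1)
    return k * n * c * m ** n
```

The factor $m^n$ makes the bound exponential in the number of $a$-labelled states.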

It follows from Lemma 10 that we can compute the value of a state $s$ wrt.
a query $\varphi$ of the form $\exists\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq ?}\, b)$ as follows: First compute the maximal
probability $q$ of fulfilling $a \,\mathrm{U}\, b$ from $s$, which can be done in polynomial time.
If $p \geq q$, we know that the value of $s$ wrt. $\varphi$ must be infinite. Otherwise,
$\mathrm{val}_\varphi(s) \leq r := k n c m^n$, where $k = \max\{-\lfloor \ln(q - p) \rfloor, 1\}$, and we can find the
least $i$ such that $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b) > p$ by computing $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ for all
$i \in \{0, 1, \dots, r\}$, which can be done in time $\mathrm{poly}(r \cdot |\mathcal{M}|)$. Since $r$ is exponential
in the number of states of the given MDP $\mathcal{M}$, the running time of this algorithm
is exponential in the size of $\mathcal{M}$. If $\varphi$ is of the form $\exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$, the algorithm
is similar, but in the case that $p = q$, we compute $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ for all
$i \in \{0, 1, \dots, nc\}$ in order to determine whether the value is infinite or one of
these numbers $i$.
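
The sequential search itself is a simple loop once the probabilities for each bound $i$ are available. In the following Python sketch, `prob` is a hypothetical callback returning $\max_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ for a given $i$ and `r_max` plays the role of the bound $r$ from Lemma 10; both names are assumptions made for illustration.

```python
import math
from fractions import Fraction


def least_reward_bound(prob, p, r_max):
    """Least i in {0, ..., r_max} with prob(i) > p, or math.inf if none exists.

    prob  -- hypothetical callback: reward bound i -> optimal probability
             of fulfilling a U_{<=i} b from the state of interest
    p     -- probability threshold of a query with strict comparison '>'
    r_max -- upper bound on a finite value (e.g. the bound from Lemma 10)
    """
    for i in range(r_max + 1):
        if prob(i) > p:
            return i
    return math.inf


# Example: a Markov chain whose only a-state has reward 1, loops with
# probability 1/2 and moves to a b-state with probability 1/2, so that
# the probability of a U_{<=i} b equals 1 - 2^{-i}.
example_prob = lambda i: 1 - Fraction(1, 2 ** i)
```

With `example_prob`, the search for threshold 9/10 stops at the first $i$ with $2^{-i} < 1/10$.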

Theorem 11. Queries of the form $\exists\mathrm{P}_{\geq p}(a \,\mathrm{U}_{\leq ?}\, b)$ or $\exists\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq ?}\, b)$ can be
evaluated in exponential time.
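5.2 Universal queries

In order to solve queries of the form $\forall\mathrm{P}_{>p}(a \,\mathrm{U}_{\leq ?}\, b)$, we first show how to compute
the minimal probabilities for fulfilling the path formula $a \,\mathrm{U}_{\leq r}\, b$ when we are
given the reward bound $r$. Given an MDP $\mathcal{M}$, $a, b \in \mathit{AP}$ and $r \in \mathbb{N}$, consider the
following linear program over the variables $x_{s,i}$ for $s \in S$ and $i \in \{0, 1, \dots, r\}$:

Maximise $\sum x_{s,i}$ subject to

$x_{s,i} \leq 1$ for all $s \in S$ and $i \leq r$,

$x_{s,i} = 0$ for all $s \in S$ with $s \not\models \forall\mathrm{P}_{>0}(a \,\mathrm{U}_{\leq i}\, b)$ and $i \leq r$,

$x_{s,i} \leq \sum_{t \in S} \delta(s, \alpha, t) \cdot x_{t,\, i - \mathit{rew}(s)}$ for all $s \in S \setminus \lambda^{-1}(b)$, $\alpha \in \mathit{Act}$ and $\mathit{rew}(s) \leq i \leq r$.

This program is of size $r \cdot |\mathcal{M}|$, and it can be shown that setting $x_{s,i}$ to
$\min_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ yields the optimal solution. Since the set of states $s$ with
$s \models \forall\mathrm{P}_{>0}(a \,\mathrm{U}_{\leq i}\, b)$ can be computed in polynomial time (Theorem 6), this means
that we can compute the numbers $\min_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq i}\, b)$ in time $\mathrm{poly}(r \cdot |\mathcal{M}|)$. The
following lemma is the analogue of Lemma 8 for minimal probabilities.

Lemma 12. Let $\mathcal{M}$ be an MDP where the denominator of each transition
probability is at most $m$, and let $n = |\lambda^{-1}(a)|$, $c = \max\{\mathit{rew}(s) : s \in \lambda^{-1}(a)\}$ and
$r = k n c m^n$ for some $k \in \mathbb{N}^+$. Then $\min_\sigma \Pr_s^\sigma(a \,\mathrm{U}\, b) < \min_\sigma \Pr_s^\sigma(a \,\mathrm{U}_{\leq r}\, b) + e^{-k}$
for all $s \in S$.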

Proof. Without loss of generality, assume that all $b$-labelled states are absorbing.
Let us call a state $s$ of $\mathcal{M}$ dull if $s \models \exists P_{=0}(a\,U\,b)$, and denote by $D$ the set of dull
states. Note that $s \in D$ for all states $s$ with $s \models \neg a \wedge \neg b$. If $s$ is not dull, then it is
easy to see that, for any scheduler $\sigma$, the probability of reaching a $b$-labelled state
from $s$ in at most $n$ steps (while seeing only $a$-labelled states before reaching a
$b$-labelled state) is at least $m^{-n}$. Since any $a$-labelled state has reward at most $c$,
we get that $\Pr_s^\sigma(a\,U_{\le nc}\,b) \ge m^{-n}$ for all non-dull states $s$ and all schedulers $\sigma$. In
the following, denote by $Z$ the set $\{s \in S : s \models a \wedge \neg b \text{ and } \mathit{rew}(s) = 0\}$, and let
$\psi$ be the path formula $b \vee D \vee GZ$. In the same way as in the proof of Lemma 8,
we can infer that $\Pr_s^\sigma(\neg(a\,U_{\le r}\,\psi)) < e^{-k}$ for all states $s$ and all schedulers $\sigma$. Now
fix a scheduler $\tau$ that minimises $\Pr_s^\tau(a\,U_{\le r}\,b)$ for all $s \in S$ and a scheduler $\sigma$ such
that $\Pr_s^\sigma(a\,U\,b) = 0$ for all $s \in D$. From $\tau$ and $\sigma$, we devise another scheduler $\tau^*$
by setting

\[
\tau^*(x) =
\begin{cases}
\tau(x) & \text{if } x \in (S \setminus D)^*, \\
\sigma(x_2) & \text{if } x = x_1 \cdot x_2 \text{ where } x_1 \in (S \setminus D)^* \text{ and } x_2 \in D \cdot S^*.
\end{cases}
\]

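Note that $\Pr_s^{\tau^*}(a\,U_{\le r}\,(D \vee GZ) \wedge a\,U\,b) = 0$ and $\Pr_s^{\tau^*}(a\,U_{\le r}\,(D \vee GZ)) =
\Pr_s^{\tau}(a\,U_{\le r}\,(D \vee GZ))$ for all $s \in S$. Hence,

\[
\begin{aligned}
\Pr_s^{\tau^*}(a\,U\,b) &= \Pr_s^{\tau^*}\bigl(a\,U\,b \wedge \neg(a\,U_{\le r}\,(D \vee GZ))\bigr) \\
&\le \Pr_s^{\tau^*}\bigl(\neg(a\,U_{\le r}\,(D \vee GZ))\bigr) \\
&= \Pr_s^{\tau}\bigl(\neg(a\,U_{\le r}\,(D \vee GZ))\bigr) \\
&\le \Pr_s^{\tau}\bigl(\neg(a\,U_{\le r}\,\psi) \vee (a\,U_{\le r}\,b)\bigr) \\
&\le \Pr_s^{\tau}\bigl(\neg(a\,U_{\le r}\,\psi)\bigr) + \Pr_s^{\tau}(a\,U_{\le r}\,b) \\
&< e^{-k} + \Pr_s^{\tau}(a\,U_{\le r}\,b) \\
&= e^{-k} + \min_\sigma \Pr_s^{\sigma}(a\,U_{\le r}\,b)
\end{aligned}
\]

for all $s \in S$. Since $\min_\sigma \Pr_s^\sigma(a\,U\,b) \le \Pr_s^{\tau^*}(a\,U\,b)$, this inequality proves the
lemma. □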
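
The proof above only refers to Lemma 8 for the bound $e^{-k}$. One plausible way to obtain it, under the assumption (ours, not spelled out here) that the budget $r = kncm^n$ is split into $km^n$ consecutive windows of reward $nc$ and that, in each window, the conditional probability of failing to reach $\psi$ is at most $1 - m^{-n}$, is the following calculation, which uses $1 - x < e^{-x}$ for $x > 0$:

\[
\Pr_s^\sigma\bigl(\neg(a\,U_{\le r}\,\psi)\bigr) \le \bigl(1 - m^{-n}\bigr)^{km^n} < \bigl(e^{-m^{-n}}\bigr)^{km^n} = e^{-k}.
\]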

With the help of Lemma 12, we can devise an upper bound for the value of a
query of the form $\forall P_{>p}(a\,U_{\le ?}\,b)$ in case this value is finite.

Lemma 13. Let $\mathcal{M}$ be an MDP where the denominator of each transition
probability is at most $m$, $\varphi = \forall P_{>p}(a\,U_{\le ?}\,b)$, $n = |\lambda^{-1}(a)|$, $c = \max\{\mathit{rew}(s) : s \in \lambda^{-1}(a)\}$,
$s \in S$, and $q = \min_\sigma \Pr_s^\sigma(a\,U\,b)$. Then one of the following statements holds:

1. $p \ge q$ and $\mathrm{val}_\varphi(s) = \infty$.

2. $p < q$ and $\mathrm{val}_\varphi(s) \le kncm^n$, where $k = \max\{-\lfloor \ln(q - p) \rfloor, 1\}$.

Proof. Clearly, if $p \ge q$, then $\mathrm{val}_\varphi(s) = \infty$, and 1. holds. On the other hand, if
$p < q$, then let $r := kncm^n$. By Lemma 12, we have that $\min_\sigma \Pr_s^\sigma(a\,U_{\le r}\,b) >
q - e^{-k} \ge q - e^{\lfloor \ln(q - p) \rfloor} \ge q - (q - p) = p$, i.e. $s \models \forall P_{>p}(a\,U_{\le r}\,b)$. Hence,
$\mathrm{val}_\varphi(s) \le r$, and 2. holds. □
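
To make the size of this bound concrete, the following Python sketch evaluates it for given parameters. The function name `quantile_upper_bound` is hypothetical, and the logarithm is evaluated in floating point, which is adequate for illustration but would need exact arithmetic in a careful implementation.

```python
import math
from fractions import Fraction


def quantile_upper_bound(p, q, n, c, m):
    """Bound of Lemma 13 on the value of a query with threshold > p.

    q is the minimal probability of a U b; n, c and m are as in the lemma.
    Returns math.inf when p >= q (statement 1), else k*n*c*m**n (statement 2).
    """
    if p >= q:
        return math.inf
    # k = max{-floor(ln(q - p)), 1}; math.log accepts Fraction via float().
    k = max(-math.floor(math.log(q - p)), 1)
    return k * n * c * m ** n


bound = quantile_upper_bound(Fraction(1, 4), Fraction(1, 2), n=3, c=2, m=2)
```

For $p = 1/4$, $q = 1/2$, $n = 3$, $c = 2$ and $m = 2$ we get $\ln(1/4) \approx -1.39$, hence $k = 2$ and the bound $2 \cdot 3 \cdot 2 \cdot 2^3 = 96$. The factor $m^n$ is what makes the resulting search range exponential in the size of the MDP.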

As in the last section, Lemma 13 can be used to derive an exponential
algorithm for computing the value of a state wrt. a query of the form $\forall P_{>p}(a\,U_{\le ?}\,b)$.

Theorem 14. Queries of the form $\forall P_{>p}(a\,U_{\le ?}\,b)$ can be evaluated in exponential
time.

Regarding queries of the form $\forall P_{\ge p}(a\,U_{\le ?}\,b)$, we can compute the value of a
state $s$ whenever the probability $\min_\sigma \Pr_s^\sigma(a\,U\,b)$ differs from $p$ using the same
algorithm. However, in the case that $p = \min_\sigma \Pr_s^\sigma(a\,U\,b)$ it is not clear how to
bound the value of $s$. As the following example shows, the analogous bound of $nc$
for existential queries from Lemma 10 does not apply in this case.

Example 15. Consider the MDP depicted in Fig. 1, where $\mathit{Act} = \{\alpha, \beta\}$ and
$q \in [0, 1[$ is an arbitrary probability. A state's reward is depicted in its bottom half,
and a transition from $s$ to $t$ labelled with $\alpha, p$ indicates that $\delta(s, \alpha, t) = p$. Only
transitions from non-absorbing states with nonzero probability and corresponding
to enabled actions are shown. Assuming that every state is labelled with $a$ but
only $s_3$ and $s_5$ are labelled with $b$, it is easy to see that $\min_\sigma \Pr_{s_0}^\sigma(a\,U\,b) = \frac{1}{2}$.
Moreover, a quick calculation reveals that the value of state $s_0$ with respect to
the query $\forall P_{\ge 1/2}(a\,U_{\le ?}\,b)$ equals $-\lfloor 1/\log_2 q \rfloor$. Since $q$ can be chosen arbitrarily
close to 1, this value can be made arbitrarily high.

5.3 A pseudo-polynomial algorithm for Markov chains

In this section, we give a pseudo-polynomial algorithm for evaluating quantile
queries of the form $P_{\triangleright p}(a\,U_{\le ?}\,b)$ on Markov chains. (Note that the quantifiers
$\exists P$ and $\forall P$ coincide for Markov chains.) More precisely, our algorithm runs in
time $\mathrm{poly}(c \cdot |\mathcal{M}| \cdot \|p\|)$ if $c$ is the largest reward in $\mathcal{M}$. As an important special
case, our algorithm runs in polynomial time on Markov chains where each state
has reward 0 or 1.

Our polynomial-time algorithm relies on the following equations for computing
the probability of the event $a\,U_{=r}\,b$ in a Markov chain with rewards 0 and 1.
Given such a Markov chain and $a, b \in AP$, we denote by $Z$ the set of states $s$
such that $\mathit{rew}(s) = 0$ and $s \models a \wedge \neg b$. Then the following equations hold for all
$s \in S$, $a, b \in AP$ and $r \in \mathbb{N}$:

- $\Pr_s(a\,U_{=0}\,b) = \Pr_s(Z\,U\,b)$,

- $\Pr_s(a\,U_{=2r}\,b) = \sum_{t \in S \setminus Z} \Pr_s(a\,U_{=r}\,\{t\}) \cdot \Pr_t(a\,U_{=r}\,b)$,

- $\Pr_s(a\,U_{=2r+1}\,b) = \sum_{t \in \lambda^{-1}(a) \setminus Z} \sum_{u \in S} \Pr_s(a\,U_{=r}\,\{t\}) \cdot \delta(t, u) \cdot \Pr_u(a\,U_{=r}\,b)$.

Using these equations, we can compute the numbers $\Pr_s(a\,U_{=r}\,b)$ along the binary
representation of $r$ in time $O(\mathrm{poly}(|\mathcal{M}|) \cdot \log r)$ for Markov chains with rewards 0
and 1 (see also [12]). Since any Markov chain $\mathcal{M}$ with rewards $0, 1, \ldots, c$ can
easily be transformed into an equivalent Markov chain of size $c \cdot |\mathcal{M}|$ with rewards 0
and 1, the same numbers can be computed in time $O(\mathrm{poly}(c \cdot |\mathcal{M}|) \cdot \log r)$ for
general Markov chains. Finally, we can compute the numbers $\Pr_s(a\,U_{\le r}\,b)$ in the
same time by first applying the following operations to each $b$-labelled state $s$:
Make $s$ absorbing, add $a$ to $\lambda(s)$, and set $\mathit{rew}(s) = 1$; in the resulting Markov
chain each state $s$ fulfils $\Pr_s(a\,U_{\le r}\,b) = \Pr_s(a\,U_{=r}\,b)$.
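
The transformation to rewards 0 and 1 is not spelled out above. One possible construction, given here purely as a sketch, replaces each state of reward $k \ge 2$ by a chain of $k$ states of reward 1. The encoding of the Markov chain as Python dictionaries, the name `unfold_rewards`, and the choice to drop the target label on the inner copies (so that the target is only "reached" when the original state is entered) are all assumptions of this sketch.

```python
from fractions import Fraction


def unfold_rewards(delta, rew, labels, target="b"):
    """Return an unfolded chain (delta, rew, labels) with rewards in {0, 1}.

    delta[s] maps successors of s to probabilities; labels[s] is a set.
    A state s with reward k >= 2 becomes s, (s, 1), ..., (s, k - 1).
    """
    new_delta, new_rew, new_labels = {}, {}, {}
    for s, succ in delta.items():
        k = rew[s]
        if k <= 1:
            # States with reward 0 or 1 are copied unchanged.
            new_delta[s], new_rew[s], new_labels[s] = dict(succ), k, set(labels[s])
            continue
        chain = [s] + [(s, j) for j in range(1, k)]
        for j, u in enumerate(chain):
            new_rew[u] = 1
            # Inner copies keep the labels of s except the target label.
            new_labels[u] = set(labels[s]) if j == 0 else set(labels[s]) - {target}
            # Only the last copy branches like s; the others step forward.
            last = j == k - 1
            new_delta[u] = dict(succ) if last else {chain[j + 1]: Fraction(1)}
    return new_delta, new_rew, new_labels


# Hypothetical two-state chain: s (label a, reward 3) and t (label b, reward 0).
mc_delta = {"s": {"s": Fraction(1, 2), "t": Fraction(1, 2)}, "t": {"t": Fraction(1)}}
mc_rew = {"s": 3, "t": 0}
mc_labels = {"s": {"a"}, "t": {"b"}}
unfolded_delta, unfolded_rew, unfolded_labels = unfold_rewards(mc_delta, mc_rew, mc_labels)
```

The unfolded chain has at most $c \cdot |S|$ states, and every visit to an original state of reward $k$ still contributes exactly $k$ to the accumulated reward before its successors are entered.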

Now let $\varphi = P_{\triangleright p}(a\,U_{\le ?}\,b)$. Our algorithm for evaluating $\varphi$ at state $s$ of a
Markov chain $\mathcal{M}$ is essentially the same algorithm as for MDPs. Hence, we first
compute the probability $q := \Pr_s(a\,U\,b)$. If either $p > q$, or $p = q$ and $\triangleright$ is $>$,
then $\mathrm{val}_\varphi(s) = \infty$ by Lemma 10. If $p < q$, then the same lemma entails that
$\mathrm{val}_\varphi(s) \le r := kncm^n$, where $n = |\lambda^{-1}(a)|$, $m$ is the largest denominator of any
transition probability, and $k = \max\{-\lfloor \ln(q - p) \rfloor, 1\} \le \mathrm{poly}(|\mathcal{M}|) + \|p\|$. Hence, we
can determine $\mathrm{val}_\varphi(s)$ using an ordinary binary search in time $O(\mathrm{poly}(c \cdot |\mathcal{M}|) \cdot
\log^2 r) = O(\mathrm{poly}(c \cdot |\mathcal{M}| \cdot \|p\|))$. Finally, the same method can be applied if $p = q$
and $\triangleright$ is $\ge$ since Lemma 10 tells us that $\mathrm{val}_\varphi(s) \le nc$ in this case.
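
The binary search mentioned above can be sketched as follows. The callable `prob_within` is a hypothetical stand-in for the computation of $\Pr_s(a\,U_{\le r}\,b)$ described earlier in this section; the sketch relies only on the fact that this probability is nondecreasing in $r$.

```python
from fractions import Fraction


def least_reward_bound(prob_within, p, upper, strict=True):
    """Smallest r in [0, upper] whose probability exceeds (or reaches) p.

    prob_within(r) must be nondecreasing in r. With strict=True the test
    is prob_within(r) > p, otherwise prob_within(r) >= p. Returns None
    if even r = upper fails the test.
    """
    def ok(r):
        return prob_within(r) > p if strict else prob_within(r) >= p

    if not ok(upper):
        return None
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid      # mid works, so the answer is at most mid
        else:
            lo = mid + 1  # mid fails, so the answer is larger
    return lo


def toy_prob(r):
    """Toy oracle: a reward-1 state that moves to the target with
    probability 1/2 and otherwise loops, so Pr(a U_{<=r} b) = 1 - 2^(-r)."""
    return 1 - Fraction(1, 2 ** r)


value = least_reward_bound(toy_prob, Fraction(9, 10), upper=100)
```

In the toy chain the search returns 4, since $1 - 2^{-3} = 0.875 \le 0.9 < 0.9375 = 1 - 2^{-4}$. The search makes $O(\log r)$ calls to the oracle, which matches the $\log^2 r$ factor in the running time when each call costs $O(\mathrm{poly}(c \cdot |\mathcal{M}|) \cdot \log r)$.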



Theorem 16. Queries of the form $P_{>p}(a\,U_{\le ?}\,b)$ or $P_{\ge p}(a\,U_{\le ?}\,b)$ can be evaluated
in pseudo-polynomial time on Markov chains.

6 Conclusions

Although many researchers have presented algorithms and sophisticated
techniques for the PCTL model checking problem and for solving PCTL and PRCTL
queries, the class of quantile-based queries has not yet been addressed in the
model checking community. In this paper, we presented algorithms for
qualitative and quantitative quantile queries of the form $\forall P_{\triangleright p}(a\,U_{\le ?}\,b)$ and their duals
$\exists P_{\triangleright p}(a\,U_{\le ?}\,b)$. We established polynomial-time algorithms for the qualitative case
and exponential algorithms for all but one of the quantitative cases. Although
the algorithms for the quantitative cases rely on a simple search algorithm for
the quantile, the crucial ingredient is the bounds presented in Lemmas 8 and 12.
These bounds might be interesting also for other purposes. There are several
open problems to be studied in future work. First, the precise complexity of
quantitative quantile queries is unknown and more efficient algorithms might
exist, despite the NP-hardness shown in [13]. Second, we concentrated here on
reward-bounded until properties, and by duality our results also apply to
reward-bounded release properties. But quantile queries can also be derived from other
PCTL-like formulas, such as formulas reasoning about expected rewards, e.g. in
combination with step bounds.

A Proof of Lemma 9

In the following, we denote by $D$ the set of states $s$ of $\mathcal{M}$ such that $s \models \forall P_{=0}(a\,U\,b)$
and assume without loss of generality that each $b$-labelled state in $\mathcal{M}$ is absorbing.
Given a scheduler $\sigma$ and a sequence $x \in S^*$, we also define $\sigma[x]$ to be the scheduler
such that $\sigma[x](y) = \sigma(xy)$ for all $y \in S^*$. Finally, we write $Fa$ as an abbreviation
for the path formula $(\neg a\,U\,a)$.

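Now let $s$ be a state of $\mathcal{M}$ such that $p = \max_\sigma \Pr_s^\sigma(a\,U\,b)$. Then it suffices to
show that for all $r \in \mathbb{N}$ we have $\mathcal{M}, s \models \varphi[r]$ if and only if $\widetilde{\mathcal{M}}, s \models \psi[r]$.

($\Rightarrow$) Assume that $\mathcal{M}, s \models \varphi[r]$. Hence, there exists a scheduler $\sigma$ for $\mathcal{M}$
such that $\Pr_s^\sigma(a\,U_{\le r}\,b) = p$. In particular, $\Pr_s^\sigma(a\,U\,b) = p$, which implies that
$\Pr_s^\sigma(a\,U\,(b \vee D)) = 1$. We claim that $\Pr_s^\sigma(a\,U_{\le r}\,(b \vee D)) = \Pr_s^\sigma(a\,U\,(b \vee D))$.
Otherwise there would exist $xt \in S^* \cdot S$ such that $xt \in \{s \in S \setminus D : s \models a \wedge \neg b\}^*$,
$\mathit{rew}(xt) > r$ and $\Pr_s^\sigma(xt \cdot S^\omega) > 0$. Since $t \notin D$ and $\Pr_s^\sigma(a\,U\,b) = \max_\sigma \Pr_s^\sigma(a\,U\,b)$,
we get that $\Pr_t^{\sigma[x]}(a\,U\,b) > 0$ and therefore also $\Pr_s^\sigma(a\,U\,b) > \Pr_s^\sigma(a\,U_{\le r}\,b)$, a
contradiction. Finally, observe that $\sigma$ induces a scheduler $\tilde\sigma$ for $\widetilde{\mathcal{M}}$ such that
$\Pr_s^{\tilde\sigma}(a\,U_{\le r}\,b) = \Pr_s^{\tilde\sigma}(a\,U\,b) = 1$, which proves that $\widetilde{\mathcal{M}}, s \models \psi[r]$.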

($\Leftarrow$) Assume that $\widetilde{\mathcal{M}}, s \models \psi[r]$. Hence, there is a scheduler $\tilde\sigma$ for $\widetilde{\mathcal{M}}$ with
$\Pr_s^{\tilde\sigma}(a\,U_{\le r}\,b) = 1$. This scheduler induces a scheduler $\sigma$ for $\mathcal{M}$ such that
$\Pr_s^\sigma(a\,U_{\le r}\,(b \vee D)) = 1$. Note that in $\mathcal{M}$ we have $p = \max_\sigma \Pr_s^\sigma(\neg FD)$. (In
particular, the memoryless, randomised scheduler $\tau^*$ that in every state $t$ uniformly
chooses an action from all those actions that maximise the probability of staying
in $S \setminus D$ has the property that $\Pr_s^{\tau^*}(a\,U\,b) = \Pr_s^{\tau^*}(\neg FD) = \max_\sigma \Pr_s^\sigma(\neg FD)$.)
Since $\sigma$ is derived from a scheduler for $\widetilde{\mathcal{M}}$, this implies that, from any state $t$,
$\sigma$ never chooses an action that does not maximise the probability of staying in
$S \setminus D$. But any such scheduler maximises the probability of never reaching $D$, i.e.
$\Pr_s^\sigma(FD) = 1 - \max_\sigma \Pr_s^\sigma(\neg FD) = 1 - p$. Hence, $\Pr_s^\sigma(a\,U_{\le r}\,b) = 1 - \Pr_s^\sigma(a\,U_{\le r}\,D) \ge
1 - \Pr_s^\sigma(FD) = p$, which proves that $\mathcal{M}, s \models \varphi[r]$. □